Spelling-checking for Highly Inflective Languages
Abstract
Spelling-checkers have become an integral part of most text-processing software. For various reasons, chiefly processing speed, they are usually based on dictionaries of word forms rather than of words. This approach is sufficient for languages with little inflection, such as English, but fails for highly inflective languages such as Czech, Russian, Slovak, and the other Slavonic languages. We have developed a special method of describing inflection for the purpose of building spelling-checkers for such languages. The speed of the resulting program lies roughly in the middle of the range of existing spelling-checkers for English, and the main dictionary fits on a standard 360K floppy disk, while the number of recognized word forms exceeds 6 million (for Czech). We have also developed a special method that makes word classification easy.
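Some quick arithmetic shows why a flat list of word forms cannot work at this scale. A 360K disk holds 368,640 bytes, which leaves roughly 0.06 bytes (about half a bit) for each of 6 million forms. The abstract does not describe the method in detail, so the sketch below only illustrates the general stem-plus-paradigm idea in Python. It is not the authors' method. The paradigm names (`zena`, `hrad`), the tiny stem dictionary, and the functions `is_correct` and `count_forms` are hypothetical and chosen for illustration. Each stem is stored once together with the inflection class it follows, and a word is accepted if it can be split into a known stem plus an ending that the stem's class allows.

```python
# Hypothetical paradigm tables: paradigm name -> endings allowed after a stem
# of that class. Simplified Czech noun declensions, for illustration only.
PARADIGMS: dict[str, frozenset[str]] = {
    # Feminine "žena" type: žen-a, žen-y, žen-ě, žen-u, žen-o, žen-ou,
    # žen (genitive plural, empty ending), žen-ám, žen-ách, žen-ami
    "zena": frozenset({"a", "y", "ě", "u", "o", "ou", "", "ám", "ách", "ami"}),
    # Masculine inanimate "hrad" type: hrad, hrad-u, hrad-e, hrad-ě, hrad-em,
    # hrad-y, hrad-ů, hrad-ům, hrad-ech
    "hrad": frozenset({"", "u", "e", "ě", "em", "y", "ů", "ům", "ech"}),
}

# Hypothetical stem dictionary: stem -> paradigms it inflects by.
# Only stems without stem alternations are listed, because this sketch
# does not model alternations.
STEMS: dict[str, set[str]] = {
    "žen": {"zena"},
    "ryb": {"zena"},
    "hrad": {"hrad"},
    "most": {"hrad"},
}


def is_correct(word: str) -> bool:
    """Accept `word` if some split stem + ending matches a stored stem
    whose paradigm allows that ending."""
    word = word.lower()
    # Try every split point, from the whole word (empty ending) down to
    # an empty stem.
    for i in range(len(word), -1, -1):
        stem, ending = word[:i], word[i:]
        for paradigm in STEMS.get(stem, ()):
            if ending in PARADIGMS[paradigm]:
                return True
    return False


def count_forms() -> int:
    """Count the distinct word forms the stem dictionary recognizes."""
    return len({
        stem + ending
        for stem, paradigms in STEMS.items()
        for p in paradigms
        for ending in PARADIGMS[p]
    })


if __name__ == "__main__":
    for w in ["ženami", "mostech", "Ryby", "ženech", "hradami"]:
        print(w, is_correct(w))
    print("forms recognized:", count_forms())
```

Four stored stems already cover 38 distinct forms here. Adding a new word costs a single entry plus a class label, which is the storage advantage over word-form lists. A realistic Czech checker would also have to encode stem alternations (for example, *škola* has dative *škole*, not *školě*), which this sketch deliberately ignores.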
Publication date: 1990